Bayesian Quadrature: Gaussian Processes for Integration

Mahsereci, Maren, Karvonen, Toni

arXiv.org Machine Learning

Bayesian quadrature is a probabilistic, model-based approach to numerical integration, that is, to the estimation of intractable integrals or expectations. Although Bayesian quadrature was popularised as early as the 1980s, no systematic and comprehensive treatment has been published. The purpose of this survey is to fill this gap. We review the mathematical foundations of Bayesian quadrature from different points of view; present a systematic taxonomy for classifying Bayesian quadrature methods along the three axes of modelling, inference, and sampling; collect general theoretical guarantees; and provide a controlled numerical study that explores and illustrates the effect of different choices along the axes of the taxonomy. We also provide a realistic assessment of the practical challenges and limitations of applying Bayesian quadrature methods, and include an up-to-date and nearly exhaustive bibliography that covers not only the machine learning and statistics literature but all areas of mathematics and engineering in which Bayesian quadrature or equivalent methods have seen use.
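As a toy illustration of the basic construction the survey studies, the following is a minimal Bayesian quadrature sketch in Python: a zero-mean GP with a Gaussian kernel is conditioned on a few function evaluations, and the posterior mean of the integral is computed from the closed-form kernel mean embedding. The lengthscale, node placement, and test integrand are arbitrary assumptions chosen for illustration, not choices made in the survey.

import numpy as np
from scipy.special import erf

def bq_posterior_mean(x, y, a, b, ell=0.3, jitter=1e-10):
    """BQ posterior mean of the integral of f over [a, b] from nodes x, values y."""
    # Gram matrix of the Gaussian kernel k(t, u) = exp(-(t - u)^2 / (2 ell^2)).
    K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell ** 2)
    # Kernel mean embedding z_i = \int_a^b k(t, x_i) dt, closed form for this kernel.
    z = ell * np.sqrt(np.pi / 2) * (
        erf((b - x) / (np.sqrt(2) * ell)) - erf((a - x) / (np.sqrt(2) * ell)))
    # BQ weights solve (K + jitter * I) w = z; the integral estimate is w^T y.
    w = np.linalg.solve(K + jitter * np.eye(len(x)), z)
    return w @ y

# Example: integrate sin(3t) over [0, 1]; true value is (1 - cos 3) / 3.
x = np.linspace(0.0, 1.0, 8)
print(bq_posterior_mean(x, np.sin(3.0 * x), 0.0, 1.0),
      (1.0 - np.cos(3.0)) / 3.0)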



A Additional definitions

Neural Information Processing Systems

We provide the definitions of important terms used throughout the paper. […] Assumption 2.3 when the demand distribution is exponential. Note that Lemma B.1 implies that […] In the following result, we show that there exist appropriate constants such that the prior distribution satisfies Assumption 2.3 when the demand distribution is a multivariate Gaussian with unknown […]. The proof is a direct consequence of Theorem 3.2, Lemmas B.6, B.7, B.8, and B.9, and Proposition 3.2. […, Theorem 6.19] the prior induced by Assumption 2.2 is […] a direct consequence of […] Assumptions 2.4 and 2.5 are straightforward to satisfy since the model risk function […] Lemma B.13. For a given […] Using the result above together with Proposition 3.2 implies that the RSVB posterior converges at […]. C.1 Alternative derivation of LCVB: we present the alternative derivation of LCVB. We prove our main result after a series of important lemmas.



Appendix A: Additional table. Table 2 presents the numerical results for the ablation study in Section 4.2

Neural Information Processing Systems

The results of our main method in Section 4.1 are reported in column Main. Test denotes the variant of using the estimated reward function as the test function when training the MIW ω. This may be related to the unstable estimation of the KL-dual discussed in Section 3.2. Removing rollout data in the policy learning generally leads to worse performance and larger standard deviations. From Eq. (22), the MIW ω can be optimized via two alternative approaches. (1) We can
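Eq. (22) itself is not reproduced in this snippet, so the following is only a generic, hypothetical sketch of the kind of objective involved: a marginalized importance weight (MIW) network ω is trained to minimize a squared stationarity residual against a fixed test function f, in the style of DICE estimators. All names, shapes, and the residual form are assumptions for illustration, not the paper's Eq. (22).

import torch

gamma = 0.99
# MIW network omega(s) >= 0 (state-only here for brevity; in general
# the MIW may depend on the state-action pair).
omega = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1), torch.nn.Softplus())
f = lambda s: s.sum(dim=-1, keepdim=True)  # stand-in fixed test function
opt = torch.optim.Adam(omega.parameters(), lr=1e-3)

# Fake batches standing in for offline transitions and initial states.
s, s_next = torch.randn(256, 4), torch.randn(256, 4)
s0 = torch.randn(256, 4)
for _ in range(200):
    # Stationarity residual: E_D[omega(s)(f(s) - gamma f(s'))] - (1 - gamma) E[f(s0)].
    residual = (omega(s) * (f(s) - gamma * f(s_next))).mean() \
               - (1.0 - gamma) * f(s0).mean()
    loss = residual.pow(2)
    opt.zero_grad(); loss.backward(); opt.step()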




4b03821747e89ce803b2dac590f6a39b-Supplemental-Conference.pdf

Neural Information Processing Systems

The implementation optimizes the acquisition function, and the posterior mean, by sampling a dense grid of points, and uses a gradient-based optimizer to further optimize the single best point. Thus, only acquisition function setup and acquisition function optimization are considered as part of the runtime. For the synthetic test functions, 100 sampled optimal pairs are used for each acquisition function. GP hyperparameters are marginalized over for these tasks, so an equal number of optimal pairs are sampled for each hyperparameter set. The hyperparameters are re-sampled on a fixed schedule throughout the run.
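The two-stage optimization described above can be sketched as follows. The acquisition function, box bounds, and grid size are placeholders assumed for illustration, not the actual implementation; the grid here is random rather than a specific quasi-random design.

import numpy as np
from scipy.optimize import minimize

def optimize_acquisition(acq, bounds, n_grid=2048, seed=0):
    """Stage 1: dense grid over the box; stage 2: polish the best grid point."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    grid = lo + (hi - lo) * rng.random((n_grid, len(lo)))
    best = grid[np.argmax([acq(x) for x in grid])]
    # Gradient-based refinement of the single best point
    # (L-BFGS-B with numerical gradients; minimize the negated acquisition).
    res = minimize(lambda x: -acq(x), best, method="L-BFGS-B",
                   bounds=list(zip(lo, hi)))
    return res.x

# Toy usage with a stand-in acquisition function on [0, 1]^2.
bounds = np.array([[0.0, 1.0], [0.0, 1.0]])
x_next = optimize_acquisition(lambda x: -np.sum((x - 0.3) ** 2), bounds)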